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Abstract 

I present a summary of recent algorithmic developments for lattice 
field theories. In particular I give a pedagogical introduction to the 
new Multicanonical algorithm, and discuss the relation between the 
Hybrid Overrelaxation and Hybrid Monte Carlo algorithms. I also 
attempt to clarify the role of the dynamical critical exponent z and 
its connection with "computational cost." 
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1 INTRODUCTION 



In this review I shall concentrate on two areas in which there has been con- 
siderable work and significant progress during the last year. The first is the 
Multicanonical algorithm which, although the underlying ideas have been 
known for some time, has been successfully applied to studying many prop- 
erties of first-order transitions in lattice field theories. I shall attempt to 
give a somewhat pedagogical introduction to this method. The second is 
the Hybrid Overrelaxation algorithm; this method has been used for some 
years, but recent analysis of its performance for the Gaussian model and of 
its relationship with the Hybrid Monte Carlo algorithm are illuminating. 

Unfortunately there has been little activity or progress on what I consider 
the two outstanding algorithmic challenges facing lattice field theory today: 
dynamical fermions and complex actions. For the former the algorithms seem 
to work quite well, albeit slowly, whereas for the latter the situation is much 
worse. 

There has been a lot of work on multigrid and cluster methods. Both have 
been shown to work well in some simple models (usually in two dimensions), 
but their efficacy has not yet been demonstrated for four dimensional non- 
abelian gauge theories. Since these methods were discussed in considerable 
detail in previous lattice conferences I shall only survey the progress very 
briefiy here. 

I shall also try to clarify the situation regarding what is known about 
critical slowing down for the Hybrid Monte Carlo algorithm. 

2 MULTICANONICAL METHODS 

Although the application of the multicanonical algorithm to lattice field the- 
ory is new |^, 0, similar techniques have been used for some time 0, ||, |]. 
For a recent review of developments in this subject see |0. 

2.1 Marginal Distributions 

A statistical mechanical system or a quantum field theory may be described 
not only in terms of their detailed microscopic states, but also in terms of 
a set of macroscopic parameters (e.g., energy, density, magnetization) which 
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suffice to specify all of its thermodynamic properties. We shall denote the 
space of microstates by A4, which has "volume" Z = Jj^dcj) with respect to 
the natural measure d(f), and we shall write to indicate an average over 
Ai with respect to the measure oc dcf) f{(p). Similarly, we shall call the space 
of macrostates C, with volume Z' = dE and averages over this space with 
respect to the measure oc dE g{E) will be written as If ^ is some ther- 
modynamic observable which depends only on some macroscopic parameters 
S then the situation may be summarized by the following equation 

M^O^R. (1) 
Microscopic and macroscopic averages are related in a trivial way:[] 

{no£) = \ f d<j>n{sm (2) 

= ]- I dcj) [ dE 8{E - E{(l)))n{E) (3) 

Z J M Jo 

= I dEp{E)Q{E) = {{Q))^ (4) 
where we have introduced the density of states 

p{E)^\ f d<p6{E-S{<j>)). (5) 

The probability distribution generated by p{E) is the marginal distribution 
on O, meaning that it is obtained from the full distribution over the much 
larger space Ai by averaging over all the other variables. 



2.2 Importance Sampling 



Importance sampling is a fundamental technique to reduce the statistical 
errors in a Monte Carlo computation. Suppose we wish to measure the 
expectation value of some quantity / : 7W — M: to this end we introduce an 
estimator 



E,. 



1 ^ f{<Pt 



(6) 



^We use the notation f2 o f to indicate functional composition: (il o £){<f)) = ri(£(0)) . 
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where samples {(pt} are chosen from the distribution fi{(/)t)- It is easy to see 
that the expectation value {E^ [f / lA) ^ — if) independent of the choice of 
distribution in fact the central limit theorem tells us the stronger result 
that 

7 



{f) + o 



T 



(7) 



where the variance of a single sample is 




(8) 
(9) 



Observe that we can reduce the error not only by increasing T but also by 
choosing so as to make V^[f/fi] smaller. 

We may find the optimal importance sampling by a simple variational 
calculation 



S 



{v. 









/ / 



I A*opt 



0. 



The solution of this equation gives the optimal probability distribution 

I/I . 



A'opt — 



(I/I)' 



(10) 



this gives a variance per sample of 

/ 1 



{\f\y 



iff, 



(11) 



which vanishes iff / has the same sign \/(f). While we cannot often achieve this 
ideal situation, generating a probability distribution which approximates the 
operator being measured can greatly reduce the amount of computer time 
necessary to get a reliable measurement of the operator's expectation value. 
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2.3 Canonical Distribution 



In field theory as well as for statistical systems we are most often interested 
finding the expectation values of operators with respect to the canonical dis- 
tribution. For statistical systems this distribution is obtained by maximizing 
the entropy S = — (logP)p with respect to the probability distribution P 
subject to the constraints that the ensemble average is at a given point in O: 
(l)p = l, {S)p = EeO. 

±{S + X{l)p-(3{S)p}p^p^ = (12) 



whose solution is 



PM) = (13) 
with the partition function Z and free energy F given by 

Z{P)= [ rf0e-'5^('^)=e-^^('^), (14) 
Jm 

and 

E = -^hgZ{(3). (15) 
In terms of macroscopic averages the canonical distribution of Eq. ([T3|) is 

{n o £)^_,, = . (16) 

Taking configurations from the canonical distribution gives good impor- 
tance sampling if Q{E) ^ 1 for all E in the subspace of O of interest. 

It is important to note that we do not know Z{l3) a priori, so using any 
other importance sampling requires the computation of ratios of estimators. 
A ratio of unbiased estimators is not an unbiased estimator for the ratio, so 
care must be taken to avoid systematic errors; as we shall see, this ought to 
be done for multicanonical computations. 

2.4 Multicanonical Distribution 

If Q{E) has exponentially large peaks then important contributions may 
come from regions where e~^^p{E) is small. This situation is illustrated 
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Figure 1: A situation in which the multicanonical method is usefuL The 
sohd hne is the canonical distribution and the dashed hne is the operator of 
interest. 
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schematically in figure |1]. A compromise to get good importance sampling 
simultaneously for Qe~^^ and e~^^ is to generate configurations with a uni- 
form density on macrostate space O. This is the muUicanonical distribution 

It leads to {nog) ^ = {{[})) and 

(p o C il o t) 1 



pe 



(pe-n 



(19) 



2.5 Algorithms to Generate the 
Multicanonical Distribution 

If p{E) ^ p{E) then we can readily generate an approximate multicanonical 
distribution 

^-<^' « WW) 



for which 



pof e-^^ QoS 



(22) 



This is cheap if all the macroscopic parameters S are local operators on 
(j) E J^. We can use a Metropolis algorithm with acceptance probability 

P{^^^') = minfl,:S^£M^ (23) 

. / p{£m\ 



mm 



in(l,e-^'°*5P°^^ . (25) 



Presumably it would be better to use Hybrid Monte Carlo or Hybrid Over- 
relaxation algorithms if they have 

For low-dimensional O we may represent the approximate spectral density 
log p(-E') by binning. Berg and Neuhaus used a piecewise linear interpolation 
(canonical within each bin) 

N 

logp(E) = Y.xAE){ii,^ + «,) (26) 
i=i 

[Xj(£') is the characteristic function of bin y. it has the value 1 if E' is 
in the bin and otherwise], whereas Marinari and Parisi use a stochastic 
superposition of equiprobable linear interpolations (each term canonical) 

N 

~p{E) = Y.P,{(3,E + a,) (27) 
i=i 

[Pj is the probability of choosing term j from the "sum"]. The former is 
illustrated in figure |^. 



2.6 Applications 

There have been many calculations making use of the multicanonical method 
during the last year, including ||, ^, |10|, |1T], |T3|, [1^ [1^, |T6[. We shall just 
consider two very simple examples in which it is effective: 

• The surface energy at a first order phase transition, and 

• The autocorrelation time (tunnelling time) at a first order phase tran- 
sition. 

The surface energy may be defined by 

^ = 2F,.,_L^-^ (^ - oo), (28) 

-'min 

where Pmin and Pmax are illustrated in figure |^. In order to relate this to our 
formal analysis the appropriate operators may be expressed as 

Pma. = lim - log (e^^\\ ; (29) 

min 7— +±00 ^ W // 



Figure 2: A piecewise linear approximation to log p. 
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Figure 3: Schematic surface energy calculation. 
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these operators are shown (for some large but finite value of 7) with dashed 
lines in figure ^ The lack of overlap between the operator giving -Pmin cind 
the canonical distribution is obviously an example of the situation shown in 
figure m. For this reason it is clear why this operator will be sampled much 
better by the multicanonical algorithm, which will generate configurations of 
energy E uniformly distributed over the range of interest. 

The autocorrelation time (tunnelling time) for local Monte Carlo algo- 
rithms at a first order phase transition is notoriously long. The problem here 
is that any local algorithm requires the system to pass through the minimum 
separating the two phases (figure ||), so the tunnelhng time is approximately 
proportional to Pmin- Since the logarithm of the density of states is an ex- 
tensive quantity 

logp(E)ocL^ (30) 
where is the lattice volume, it follows that 

Ta OC Pmin ex e"^-^'', (31) 

where A is the error in the approximation for log p{E) at any fixed volume. 
For a true multicanonical distribution A = 0, and for an approximate one it 
is at least much smaller than the canonical barrier, as shown in figure ^. 

2.7 Open Questions 

While the multicanonical method works very well in practice, there are still 
several interesting algorithmic questions about it which ought to be ad- 
dressed. 

• For a fixed p{E) the autocorrelation time ta is still exponential in , 
but with a much smaller exponent. This follows simply from equa- 
tion dl]). 

• We can measure p{E) during the course of a multicanonical computa- 
tion by counting the number of configurations landing in each bin, and 
it can be used to improve the approximation p{E). 

• Just as for simulated annealing, however, no one has given an algorithm 
for evolving p{E). In order to do this one would have to address the 
following issues: 
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Figure 4: Another situation in which the multicanonical method is useful; 
the canonical distribution at a first-order phase transition. 
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— How do we distinguish a statistically significant difference between 
p and p from fluctuations? 

— How do we ensure stability, and that p ^ pi 

— The number of bins must presumably grow as to avoid an expo- 
nential autocorrelation time, yet there must be sufficient statistics 
in each bin to give a significant estimate for p. 

— Is there an algorithm giving a power-law volume dependence for 

— Do we care? In practical simulations we might not mind having 
an exponential algorithm provided that the exponent was small 
enough. We might even prefer it to a polynomial algorithm with 
a large coefficient. 



3 HYBRID OVERRELAXATION 



The name "Hybrid Overrelaxation" appears to have been coined by Wolff 
[p^ who recently analyzed the dynamical critical behaviour of this algorithm 
in the Gaussian model. The algorithm, however, has been used in a variety 



of models over the past few years |18, IS, 20, 21 



belongs to a class of overrelaxed algorithms introduced by Adler 



It 
with 



dynamical critical exponent ^ 1 for the Gaussian model p9, 30, pT|. 



3.1 Adler Overrelaxation 

Consider the Gaussian model defined by the free field action 



(32) 



A single-site Adler overrelaxation (AOR) update |28[ with parameter C, re- 
places 0(a;) by 



b\x) = (1 - C)0(x) + ^ + ViiL^ 



(33) 
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where ixP' = 2D + is the square of the highest frequency in the spectrum, 
the "force" on (j){x) due to its neighbours is JF = J2\x-y\=i 'Piv) y cind 77 is a 
Gaussian-distributed random number. 

3.2 z = 1 for Gaussian Model 

The lattice may be updated using a checkerboard scheme, alternating the 
update of all even and all odd sites. This is just a consequence of the locality 
of the action. This is also the basis of Wolff's fl^ analysis of the dynamical 
critical behaviour of overrelaxation in the Gaussian model: the fields on the 
even and odd sublattices can Fourier transformed, and equation (|33D becomes 
block diagonal in momentum space. Determining the autocorrelations of the 
method thus reduce to studying the eigenvalues of a 2 x 2 matrix. 

Let us now consider two special cases of the Adler overrelaxation algo- 
rithm, corresponding to different values of (. For C = 1 this is the heatbath 
(HB) algorithm, because (p'i^x) does not depend upon The exponential 

autocorrelation time is 

^ ip' - 0), (34) 



which corresponds to 2; = 2. If we adjust ( so as to minimize the autocorre- 
lation time we find that 



2m ns v D , ^ 

C = 2-^ + Om2, ^^^^^y. 35 

V-D 2m 



which gives z = 1. 

3.3 Hybrid Overrelaxation 

For C = 2 the new field value, 4>'{x), does not depend on the noise rj, so the 
update is not ergodic. The Hybrid Overrelaxation (HOR) algorithm cures 
this by alternating AOR steps with a single HB step. Minimizing the 
autocorrelation time requires increasing N (xl/m, which leads to Thor = 1- 



Just as pointed out by Weingarten and Mackenzie fS^, ^ for Hybrid 
Monte Carlo (HMC), A^ should be varied randomly around this value to 
avoid accidental non-ergodicity; although this is probably only a problem for 
the Gaussian model and not for interacting field theories. 
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3.4 Over relaxation and Hybrid 
Monte Carlo 

I now want to show that the HOR algorithm — and the AOR algorithm 
too — are closely related to the HMC algorithm. To this end lets us con- 
sider the HMC algorithm applied to the Gaussian model: we introduce the 
Hamiltonian 

i^(vr,0) = ^7r2 + 5(0) (36) 

on "fictitious" phase space. The corresponding equation of motion is 

(i)^ = -u'^cpx + ^, (37) 

whose solution in terms of the Gaussian distibuted random initial momentum 
Tlx and the initial field value (px is 

, , , , 1 — cos cut _ Tlx . 

(Pxit) = (pxcosut -\ JFH smut, (38) 

which is exactly the AOR update considered before if we identify ( = 1 — 
cosa;t. If we use this exact solution of the equations of motion to generate 
candidate configurations for HMC then the acceptance rate will be unity. 
For interacting field theories HMC provides an exact algorithm for any value 
of the overrelaxation parameter ( simply by dropping the interaction terms 
in the action and solving the equations of motion exactly for the resulting 
Gaussian model. 

Let us summarize the differences between this variant of the HMC algo- 
rithm and the usual one. For "conventional" HMC: 

• All sites are updated at once; 

• Each trajectory is of length r oc 1/m and is a sequence of many steps 
of length 6t] 

• There is a global Metropolis accept/reject step. 
For "local" HMC |3§: 

• Even and odd updates are alternated; 

• Each trajectory is of length r ^ tt/uj = 0(1); 

• There is a local Metropolis accept /reject step. 
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3.5 Leapfrog vs. Free Field Guidance 

It is important to realize that there are two separate issues; one is whether 
a global accept/reject step is used, the other is whether an exact solution to 
the free-field equations of motion or an approximate (leapfrog) solution to 
the true equations of motion is used. 

• Local Metropolis acceptance/rejection: 

— Useful only for local (bosonic) theories; 

— Acceptance rate does not depend on the lattice volume. 

• Free- field instead of leapfrog guidance: 

— Leapfrog has 5t errors so P^^ < 1 even for free field theory; 

— Free field guidance has errors of order A for interacting theories, 
whereas leapfrog has errors of order A^r^, where A is the coupling 
constant of the interaction part of the action. 



4 DYNAMICAL CRITICAL 
EXPONENTS 

Let us begin this topic by recalling the definition of the dynamical critical 
exponent z. It relates a "dynamical" property of the algorithm generating 
field configurations, the autocorrelation time ta (either the exponentional 
autocorrelation time or an integrated autocorrelation time will do), and a 
"static" property of the underlying field theory, the correlation length ^: 

r^cxT (e^oo). (39) 

One of the main "selling points" of algorithms is that they reduce (z ~ 1) 
or even eliminate {z ~ 0) critical slowing down from the value {z ~ 2) 
characteristic of "random walk" methods. In general terms it is possible to 
reduce z from 2 to 1 by using just the right amount of randomness in the 
algorithm (see sections |3.2| and [4.2|) , whereas further reduction of z seems 
PH I to require an algorithm with more specific "knowledge" of the dynamics 
of the theory (section ^) . 
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Although the theoretical analysis of critical slowing down is usually lim- 
ited to the Gaussian model, this is relevant because an algorithm is often 
better just because it is not quite so dumb in dealing with free fields. How 
close real theories are to free fields is an empirical question: for some in- 
teracting field theories the values of z have been determined empirically 



37, 38, 39, BO, 41 



4.1 Do we care too much about 

The characterization of algorithms by their dynamical critical exponent has 
perhaps been somewhat over-emphasized. Care is required not only because 
our computations are not always peformed at parameters where the asymp- 
totic behaviour of the algorithms has set in, but also because the cost of a 
computation is not just given by the autocorrelation time. 

We want to study the continuum limit (critical behaviour) of some lattice 
model in a large enough box (thermodynamic limit). In order that the sys- 
tematic errors are under control we need to match both the short and long 
distance behaviour of the lattice regularization to some analytic form: 

• For asymptotically free theories we hope we can match the ^ja ^ oo 
scaling to perturbation theory. 

• Non-perturbative effects fall off exponentially fast in UV asymptopia. 

• For non asymptotically free theories it is unclear how we can verify 
that a is small enough. 

• We hope to match results using finite size scaling for L/^ ^ oo. 

We only need to carry out numerical computations for an essentially finite 
range of lattice spacings. Good asymptotic dynamical scaling behaviour of 
an algorithm is useful only if 

• scaling sets in for the lattice sizes we are interested in, 

• the coefficient in equation (|39|) is small. 



4.2 Critical Slowing Down and Finite 
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Size Scaling 



Our definition of z (equation (p9D) assumes L ^ ^, whicli is usually the 
case for lattice gauge theory computations. If we want to study finite size 
scaling computations are sometimes done in the regime L <^ ^, and the 



dynamical critical exponent is defined by ta oc |^^. It is not clear that 
the definitions are equivalent if some parameters have to be tuned to values 
depending upon ^, e.g., r oc ^ in HMC For HMC in the Gaussian model 



with L > ^, 2; = 2 if r =constant, and 2; = 1 if r oc ^ [||, If ^ > L then 



z = 2 for r =constant, but I know of no analysis for any other cases. 
4.3 Computational Cost 

For the global HMC algorithm z is not the whole story; the computational 



cost explicitly depends on the lattice volume even at fixed ^ |45, 46 



7;.^, oc V^^/^ = L^^/^ (40) 

because the integration step size 5r has to be reduced so as to keep the 
Metropolis acceptance rate constant. It must be remembered that T^^^^ oc V 
is true even for local algorithms (one has to look at every site!). For higher- 
order leapfrog schemes this cost can be reduced to T^omp oc V^^'' (an interest- 
ing new way of understanding these higher-order algorithms is presented in 

0)- 

Before leaving this subject I would like to mention a couple of related 
topics. First, some interesting results on the scaling behaviour of HMC in 
the presence of dynamical fermions are presented in PSI, Second, an 



interesting variant of HMC which adds a small amount of randomness to the 
momenta at each integration step (instead of a complete momentum refresh- 
ment after an entire trajectory) was found by Horowitz ||50[. Unfortunately, 



the deterministic part of the momenta has to have its sign flipped after every 
step in order to satisfy the detailed balance condition, and this means the 
modified algorithm does not perform any better (although a detailed proof 
has not yet been presented). 



5 MULTIGRID AND 
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CLUSTER METHODS 



I mentioned previously that algorithms which have z < 1 appear to require 
some knowledge of the detailed dynamics of the model being simulated. For 
some spin models this is achieved in a non-trivial manner by cluster algo- 
rithms, but for lattice gauge theories like QCD all attempts in this direction 
are motivated by free field theory dynamics. This is not a bad idea, because 
such theories are asymptotically free. There has not been much activity in 
the area of Fourier acceleration, but there has been a great deal of work on 
Multigrid methods. Since such methods were reviewed in detail last year I 
shall merely survey work in this area extremely briefly. 

Grabenstein and Finn have presented a formalism for computing the ac- 
ceptance rate for Multigrid Monte Carlo |51, There have been numerous 



applications and tests of multigrid methods, including those of Hasenbusch 
and Meyer ||5^, ^ | 



(strictly speaking these are really "unigrid" algo- 
rithms) , Laursen and Vink , and Edwards et al. |^ . 

Multigrid methods are also under active study as a means of inverting the 
lattice Dirac operator more efficiently ^ |^. Another method 



used for this purpose has been proposed by Vyas |62|, |63[. Whether these 
techniques can significantly outperform the conjugate gradient algorithm is 
not yet settled. 

Multigrid methods have also been used to expedite gauge fixing (see 
also the work of van der Sijs |]65|). 

Cluster algorithms have also been under active study. An interesting 
way of understanding them has been suggested by Wolff [^]. Their area of 
applicability is still mainly for spin models of various kinds ||67| , |68| where 
they are extremely successful and are often the clear method of choice. Nev- 
ertheless, their use is limited to models whose spin manifold possesses an 
involutive isometry whose fixed point manifold has codimension one (i.e., a 
discrete quotient of a product of spheres), as was shown by Sokal and collab- 



orators 



B. Bunk 



describes a cluster algorithm which is applicable 
to Z(2) gauge theory. 

The parallelism inherent in FFT, multigrid, and cluster algorithms is 
more complex than the simple grid-like structure needed for the algorithms 
discussed in previous sections (they require only nearest-neighbour commu- 
nication with infrequent global summations). The implementation of cluster 
algorithms on parallel and vector computers has been discussed in several 
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papers |7l|, In the future it will be interesting to characterize these 

algorithms in terms of the type of communications network required to im- 
plement them efficiently on a parallel computer (e.g., combining networks, 
infinite dimensional grids, etc.). 



6 OTHER ALGORITHMS 

There are, of course, numerous interesting results which I have not had time 
or space to discuss in detail. I do wish to mention at least a few of them here, 
so that the interested reader will be able to refer to the original literature. 

An interesting algorithm called "microcanonical cluster Monte Carlo" was 
introduced by Creutz |7^. This algorithm allows a discrete spin variable to 
interact with a heat bath (provided by a family of "demons"), and for the 
demon energies to be refreshed occasionally. In some ways this method is a 
discrete analogue of the HMC method. 

Sloan, Kusnezov, and Bulgac pi| introduce a "chaotic molecular dynam- 



ics" algorithm, which couples chaotic degrees of freedom to a system to ensure 
ergodicity (instead of, or perhaps as well as, refreshing the fictitious momenta 
of the Hybrid algorithm). Neal [^] suggests a new procedure which may serve 
to increase the acceptance rate of the HMC algorithm. Finally, Bhanot and 



Adler [76 describe a parallel algorithm for updating spin models. 
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